34 research outputs found
Guided Co-training for Large-Scale Multi-View Spectral Clustering
In many real-world applications, we have access to multiple views of the
data, each of which characterizes the data from a distinct aspect. Several
previous algorithms have demonstrated that one can achieve better clustering
accuracy by integrating information from all views appropriately than using
only an individual view. Owing to the effectiveness of spectral clustering,
many multi-view clustering methods are based on it. Unfortunately, they have
limited applicability to large-scale data due to the high computational
complexity of spectral clustering. In this work, we propose a novel multi-view
spectral clustering method for large-scale data. Our approach is structured
under the guided co-training scheme to fuse distinct views, and uses a
sampling technique to accelerate spectral clustering. More specifically, we
first select landmark points and then approximate the
eigen-decomposition accordingly. The augmented view, which is essential to
the guided co-training process, can then be quickly determined by our method. The
proposed algorithm scales linearly with the number of data points. Extensive
experiments have been performed and the results support the advantage of our
method for handling the large-scale multi-view situation.
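The landmark idea in the abstract above can be illustrated with a minimal sketch. It covers only the construction of a point-to-landmark affinity matrix, the step that keeps the cost linear in the number of data points; the eigen-decomposition and the co-training loop are omitted. The uniform random sampling, the Gaussian kernel, and the names `landmark_affinity`, `num_landmarks`, and `sigma` are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def landmark_affinity(points, num_landmarks, sigma=1.0, seed=0):
    """Return (landmarks, Z), where Z[i][j] is the row-normalized Gaussian
    affinity between data point i and landmark j.

    Building Z costs O(n * p) for n points and p landmarks, instead of the
    O(n^2) needed for a full pairwise affinity matrix.
    """
    rng = random.Random(seed)
    # Assumption: landmarks are sampled uniformly at random from the data.
    landmarks = rng.sample(points, num_landmarks)
    Z = []
    for x in points:
        row = []
        for m in landmarks:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, m))
            row.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        total = sum(row)
        # Normalize so that each row sums to one.
        Z.append([v / total for v in row])
    return landmarks, Z


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
landmarks, Z = landmark_affinity(points, num_landmarks=2)
```

In landmark-based spectral methods, a small matrix derived from `Z` is typically decomposed in place of the full n-by-n affinity matrix; whether the paper does exactly this is not stated in the abstract.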
Pixel-wise Deep Learning for Contour Detection
We address the problem of contour detection via per-pixel classification of
edge points. To facilitate the process, the proposed approach leverages
DenseNet, an efficient implementation of multiscale convolutional neural
networks (CNNs), to extract an informative feature vector for each pixel and
uses an SVM classifier to accomplish contour detection. In the experiment of
contour detection, we look into the effectiveness of combining per-pixel
features from different CNN layers and verify their performance on BSDS500.
Comment: 2 pages. arXiv admin note: substantial text overlap with
arXiv:1412.685
C2S2: Cost-aware Channel Sparse Selection for Progressive Network Pruning
This paper describes a channel-selection approach for simplifying deep neural
networks. Specifically, we propose a new type of generic network layer, called
pruning layer, to seamlessly augment a given pre-trained model for compression.
Each pruning layer, comprising depth-wise kernels, is represented
with a dual format: one is real-valued and the other is binary. The former
enables a two-phase optimization process of network pruning to operate with an
end-to-end differentiable network, and the latter yields the mask information
for channel selection. Our method progressively performs the pruning task
layer-wise, and achieves channel selection according to a sparsity criterion to
favor pruning more channels. We also develop a cost-aware mechanism to prevent
the compression from sacrificing the expected network performance. Our results
for compressing several benchmark deep networks on image classification and
semantic segmentation are comparable to those of state-of-the-art methods.
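The dual format described above (a real-valued representation for optimization and a binary one for channel selection) can be sketched as follows. This is only an illustration: the fixed threshold, the per-channel scalar gates, and the names `binarize` and `select_channels` are assumptions, and the paper's actual pruning layer and two-phase optimization are not reproduced.

```python
def binarize(gates, threshold=0.5):
    """Turn real-valued per-channel gates into a binary keep/drop mask.

    Assumption: a channel is kept when its gate exceeds a fixed threshold.
    """
    return [1 if g > threshold else 0 for g in gates]


def select_channels(channels, mask):
    """Keep only the channels whose mask entry is 1."""
    return [ch for ch, keep in zip(channels, mask) if keep]


# Real-valued gates (the trainable format) and the derived binary mask.
gates = [0.9, 0.1, 0.7]
mask = binarize(gates)
kept = select_channels(["c0", "c1", "c2"], mask)
```

A sparsity criterion would push more gates below the threshold during training, so that more channels are dropped when the mask is applied.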
Contour Detection Using Cost-Sensitive Convolutional Neural Networks
We address the problem of contour detection via per-pixel classification of
edge points. To facilitate the process, the proposed approach leverages
DenseNet, an efficient implementation of multiscale convolutional neural
networks (CNNs), to extract an informative feature vector for each pixel and
uses an SVM classifier to accomplish contour detection. The main challenge lies
in adapting a pre-trained per-image CNN model for yielding per-pixel image
features. We propose to build on the DenseNet architecture to achieve pixel-wise
fine-tuning and then consider a cost-sensitive strategy to further improve the
learning with a small dataset of edge and non-edge image patches. In the
experiment of contour detection, we look into the effectiveness of combining
per-pixel features from different CNN layers and obtain performance comparable
to the state of the art on BSDS500.
Comment: 9 pages, 3 figures
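A cost-sensitive strategy of the kind mentioned above can be sketched as a class-weighted log loss, where misclassifying an edge pixel costs more than misclassifying a non-edge pixel. The abstract does not state which loss or cost values are used, so the log loss, the default costs, and the name `cost_sensitive_log_loss` are assumptions for illustration.

```python
import math


def cost_sensitive_log_loss(probs, labels, pos_cost=5.0, neg_cost=1.0):
    """Mean log loss with different costs for edge (1) and non-edge (0) samples.

    probs  -- predicted probability that each sample is an edge
    labels -- ground-truth labels, 1 for edge and 0 for non-edge
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0).
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        if y == 1:
            total += -pos_cost * math.log(p)
        else:
            total += -neg_cost * math.log(1.0 - p)
    return total / len(labels)


# Missing an edge is penalized more heavily than a false alarm.
missed_edge = cost_sensitive_log_loss([0.1], [1])
false_alarm = cost_sensitive_log_loss([0.9], [0])
```

Weighting the rarer class more heavily is a common way to cope with the imbalance between edge and non-edge patches in a small training set.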
Non-local RoI for Cross-Object Perception
We present a generic and flexible module that encodes region proposals by
both their intrinsic features and the extrinsic correlations to the others. The
proposed non-local region of interest (NL-RoI) can be seamlessly adapted into
different generalized R-CNN architectures to better address various perception
tasks. Observe that existing techniques from R-CNN treat RoIs independently and
perform the prediction solely based on image features within each region
proposal. However, the pairwise relationships between proposals could further
provide useful information for detection and segmentation. NL-RoI is thus
formulated to enrich each RoI representation with the information from all
other RoIs, and yield a simple, low-cost, yet effective module for region-based
convolutional networks. Our experimental results show that NL-RoI can improve
the performance of Faster/Mask R-CNN for object detection and instance
segmentation.
Comment: NIPS 2018 Workshop on Relational Representation Learning. arXiv admin
note: substantial text overlap with arXiv:1807.0536
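The idea of enriching each RoI with information from the other RoIs can be sketched as a parameter-free non-local (attention-style) operation over RoI feature vectors. This is a simplification under stated assumptions: it uses scaled dot-product similarity with a softmax, has no learned projections, and lets each RoI attend to itself as well; the function name `non_local_roi` is hypothetical and the paper's exact formulation may differ.

```python
import math


def non_local_roi(features):
    """Add to each RoI feature a softmax-weighted sum of all RoI features.

    features -- list of RoI feature vectors (lists of floats, equal length)
    """
    dim = len(features[0])
    enriched = []
    for query in features:
        # Pairwise similarity between this RoI and every RoI (itself included).
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        # Numerically stable softmax over the similarity scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Weighted aggregation of all RoI features.
        context = [
            sum(weights[j] * features[j][d] for j in range(len(features)))
            for d in range(dim)
        ]
        # Residual connection: intrinsic feature plus extrinsic context.
        enriched.append([q + c for q, c in zip(query, context)])
    return enriched


rois = [[1.0, 2.0], [1.0, 2.0]]
out = non_local_roi(rois)
```

Because the output has the same shape as the input, a block like this can be placed in front of an existing detection or mask head without changing it.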
Trust-Region Methods for Real-Time Tracking
Optimization methods based on iterative schemes can be divided into two classes: line-search methods and trust-region methods. While line-search techniques are commonly found in various vision applications, not much attention is paid to trust-region methods. Motivated by the fact that line-search methods can be considered as special cases of trust-region methods, we propose to apply trust-region methods to visual tracking problems.
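For readers unfamiliar with the distinction, a textbook trust-region iteration can be sketched in one dimension. The sketch minimizes a local quadratic model within a radius, compares the predicted and actual decrease, and adapts the radius accordingly. It is a generic illustration, not the tracking formulation of the paper; the function name and the constants (0.25, 0.75, `eta`) are conventional choices assumed here.

```python
def trust_region_minimize(f, grad, hess, x, radius=1.0, max_radius=10.0,
                          eta=0.1, tol=1e-8, max_iter=100):
    """Minimize a 1-D function with a basic trust-region scheme."""
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) < tol:
            break
        # Minimize the model m(s) = f(x) + g*s + 0.5*h*s^2 subject to |s| <= radius.
        if h > 0:
            s = max(-radius, min(radius, -g / h))
        else:
            # Non-convex model: move to the boundary in the descent direction.
            s = -radius if g > 0 else radius
        predicted = -(g * s + 0.5 * h * s * s)
        actual = f(x) - f(x + s)
        rho = actual / predicted
        if rho < 0.25:
            # Poor agreement between model and function: shrink the region.
            radius *= 0.25
        elif rho > 0.75 and abs(abs(s) - radius) < 1e-12:
            # Good agreement and the step hit the boundary: expand the region.
            radius = min(2.0 * radius, max_radius)
        if rho > eta:
            # Accept the step only if the actual decrease is sufficient.
            x = x + s
    return x


x_min = trust_region_minimize(
    f=lambda x: (x - 3.0) ** 2,
    grad=lambda x: 2.0 * (x - 3.0),
    hess=lambda x: 2.0,
    x=0.0,
)
```

A line-search method fixes the direction first and then chooses a step length, whereas the scheme above fixes a maximum step length first and then chooses the step inside it.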
Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
Automatic saliency prediction in 360° videos is critical for viewpoint
guidance applications (e.g., Facebook 360 Guide). We propose a spatial-temporal
network which is (1) trained with weak supervision and (2) tailor-made for
the 360° viewing sphere. Note that most existing methods are less scalable
since they rely on annotated saliency maps for training. Most importantly, they
convert the 360° sphere to 2D images (e.g., a single equirectangular image or
multiple separate Normal Field-of-View (NFoV) images) which introduces
distortion and image boundaries. In contrast, we propose a simple and effective
Cube Padding (CP) technique as follows. Firstly, we render the 360° view
on six faces of a cube using perspective projection. Thus, it introduces very
little distortion. Then, we concatenate all six faces while utilizing the
connectivity between faces on the cube for image padding (i.e., Cube Padding)
in convolution, pooling, and convolutional LSTM layers. In this way, CP introduces
no image boundary while being applicable to almost all Convolutional Neural
Network (CNN) structures. To evaluate our method, we propose Wild-360, a new
360° video saliency dataset, containing challenging videos with saliency
heatmap annotations. In experiments, our method outperforms baseline methods in
both speed and quality.
Comment: CVPR 201
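The padding idea above can be partly illustrated for the four lateral cube faces, which wrap around the viewer without any rotation: each face borrows columns from its left and right neighbours instead of being zero-padded. This is a reduced sketch under stated assumptions. The face names, their ordering, and the function `pad_lateral_faces` are hypothetical, and padding against the top and bottom faces, which requires rotating the borrowed pixels, is deliberately left out.

```python
def pad_lateral_faces(faces, pad=1):
    """Horizontally pad the four lateral cube faces with neighbouring pixels.

    faces -- dict with keys 'front', 'right', 'back', 'left'; each value is a
             2D list (rows of pixel values), all of the same size
    pad   -- number of columns borrowed from each horizontal neighbour
    """
    # Assumption: faces are listed in the order met when turning to the right.
    order = ["front", "right", "back", "left"]
    padded = {}
    for i, name in enumerate(order):
        left_nb = faces[order[(i - 1) % 4]]
        right_nb = faces[order[(i + 1) % 4]]
        padded[name] = [
            # Last columns of the left neighbour, the face itself,
            # then the first columns of the right neighbour.
            left_nb[r][-pad:] + faces[name][r] + right_nb[r][:pad]
            for r in range(len(faces[name]))
        ]
    return padded


faces = {
    "front": [[1, 2], [3, 4]],
    "right": [[5, 6], [7, 8]],
    "back": [[9, 10], [11, 12]],
    "left": [[13, 14], [15, 16]],
}
padded = pad_lateral_faces(faces)
```

With this kind of padding, a convolution applied near a face edge sees the adjacent part of the scene rather than an artificial image boundary.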
One-Shot Object Detection with Co-Attention and Co-Excitation
This paper aims to tackle the challenging problem of one-shot object
detection. Given a query image patch whose class label is not included in the
training data, the goal of the task is to detect all instances of the same
class in a target image. To this end, we develop a novel co-attention and
co-excitation (CoAE) framework that makes contributions in three key technical
aspects. First, we propose to use the non-local operation to explore the
co-attention embodied in each query-target pair and yield region proposals
accounting for the one-shot situation. Second, we formulate a
squeeze-and-co-excitation scheme that can adaptively emphasize correlated
feature channels to help uncover relevant proposals and eventually the target
objects. Third, we design a margin-based ranking loss for implicitly learning a
metric to predict the similarity of a region proposal to the underlying query,
whether its class label is seen or unseen in training. The resulting model is
therefore a two-stage detector that yields a strong baseline on both VOC and
MS-COCO under the one-shot setting of detecting objects from both seen and
never-seen classes. Code is available at
https://github.com/timy90022/One-Shot-Object-Detection.
Comment: NeurIPS 201
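The third component above, a margin-based ranking loss, can be sketched in its generic pairwise hinge form: proposals that match the query should score higher than non-matching ones by at least a margin. This is not the loss defined in the paper or in the linked repository; the pairwise formulation, the margin value, and the name `margin_ranking_loss` are assumptions for illustration.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=0.3):
    """Mean pairwise hinge loss between matching and non-matching proposals.

    pos_scores -- similarity scores of proposals that match the query
    neg_scores -- similarity scores of proposals that do not match
    """
    total = 0.0
    count = 0
    for sp in pos_scores:
        for sn in neg_scores:
            # Penalize pairs where the positive does not beat the negative
            # by at least the margin.
            total += max(0.0, margin - (sp - sn))
            count += 1
    return total / count if count else 0.0


well_separated = margin_ranking_loss([0.9], [0.1])
too_close = margin_ranking_loss([0.5], [0.4])
```

Because the loss depends only on relative similarity to the query, it does not require the query's class to have appeared during training.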
Learning Gaussian Instance Segmentation in Point Clouds
This paper presents a novel method for instance segmentation of 3D point
clouds. The proposed method is called Gaussian Instance Center Network (GICN),
which can approximate the distributions of instance centers scattered in the
whole scene as Gaussian center heatmaps. Based on the predicted heatmaps, a
small number of center candidates can be easily selected for the subsequent
predictions with efficiency, including i) predicting the instance size of each
center to decide a range for extracting features, ii) generating bounding boxes
for centers, and iii) producing the final instance masks. GICN is a
single-stage, anchor-free, and end-to-end architecture that is easy to train
and efficient at inference. Benefiting from the center-dictated
mechanism with adaptive instance size selection, our method achieves
state-of-the-art performance in the task of 3D instance segmentation on ScanNet
and S3DIS datasets.
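The notion of a Gaussian center heatmap can be sketched by scoring each point according to its distance from the nearest instance center and then keeping the highest-scoring points as candidates. In GICN the heatmap is predicted by a network; here it is computed from known centers purely for illustration. The max over centers, the value of `sigma`, and the names `center_heatmap` and `top_candidates` are assumptions.

```python
import math


def center_heatmap(points, centers, sigma=0.5):
    """Give each point a score in (0, 1] that peaks at instance centers."""
    heat = []
    for p in points:
        best = 0.0
        for c in centers:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, c))
            # Assumption: overlapping Gaussians are merged with a max.
            best = max(best, math.exp(-sq_dist / (2.0 * sigma ** 2)))
        heat.append(best)
    return heat


def top_candidates(heat, k=1):
    """Return the indices of the k points with the highest heat values."""
    ranked = sorted(range(len(heat)), key=lambda i: heat[i], reverse=True)
    return ranked[:k]


points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
heat = center_heatmap(points, centers=[(0.0, 0.0, 0.0)])
candidates = top_candidates(heat, k=1)
```

Selecting only a few high-scoring candidates is what keeps the subsequent size, box, and mask predictions inexpensive.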
Non-local RoIs for Instance Segmentation
We introduce the concept of Non-Local RoI (NL-RoI) Block as a generic and
flexible module that can be seamlessly adapted into different Mask R-CNN heads
for various tasks. Mask R-CNN treats RoIs (Regions of Interest) independently
and performs the prediction based on individual object bounding boxes. However,
the correlation between objects may provide useful information for detection
and segmentation. The proposed NL-RoI Block enables each RoI to refer to all
other RoIs' information, and results in a simple, low-cost but effective
module. Our experimental results show that generalizations with NL-RoI Blocks
can improve the performance of Mask R-CNN for instance segmentation on the
Robust Vision Challenge benchmarks.
Comment: Robust Vision Challenge 201